The possibility of using translations of American reading tests for the evaluation
of pupils belonging to different foreign groups was explored. Two parallel forms of a
reading comprehension test geared to United States high school graduates and
college entrants, the translations of these into Turkish, and the respective
retranslations back into English were administered to five groups of high school and
college students in the United States and Turkey. Item difficulty and frequency of
responses to item errors were highly stable in the two groups. There was great
similarity in the total test scores of American and Turkish students at similar
educational levels when the test was taken in their own language. This seems to
indicate that translated reading tests remain culturally fair if total test scores, relative
difficulty of reading passages, and indices of item difficulty are criteria for test
fairness. (WL)

There are different methods for constructing tests which are culture-fair
to people from different backgrounds. The present writer classifies
these methods into two broad categories, i.e., content validity and
empirical validity. Content validity involves selection of test content
from areas in which the test is going to be administered. Empirical
validity is obtained by statistical evaluation of item and total test
scores regardless of the source of content.

An example of the content validity method is in the Spanish version of
the SAT in Puerto Rico where the test resembles the United States SAT in
format but not in content (CEEB and IIE, 1965). This approach has also
been used in extensive adaptations of the Stanford-Binet, the Wechsler,
and the Otis such as reported by Kamat (1934), Pasricha and Pagedar (1963),
and Wu (1936). Another example of the content validity method is an
international study conducted under the auspices of the UNESCO Institute
for Education (Foshay et al., 1962) where a battery of five tests was
developed jointly by representatives from the twelve countries participating
in the study. Content for the tests was selected from sources in five of
the countries. This method assures that a student from one of the five
countries will have at least 20 per cent culture-fair items in his test.
However, no statement can be made as to the culture-fairness of the
remaining 80 per cent of the items on the test.

An example of the empirical validity method is shown by Manuel (1961, 1962)
in the construction of the parallel English and Spanish versions of the
Cooperative Inter-American Tests. The criteria used in item selection were
indices of item difficulty and expression of the same thought with approximately
the same number of words. Manuel obtained these data as a result of simultaneous
tryouts within the two countries. This method is preferable to the older methods
of comparing total scores which are influenced by sampling variations. However,
this method is relatively expensive and time-consuming since original tests
must be written in the two countries under comparison, tried out, and
analyzed prior to the construction of the final tests.

The present research was conducted to explore the possibility of using
direct translations of a reading test for college-level students in different
countries. By analyzing the changes that occur in a test after direct
translation and administration in a different culture, one can gain a
knowledge of variables which make a test culture-fair. If these changes,
as shown by item analysis data, are minor, one can use direct translations
of the original test as opposed to (a) qualitative adaptations of content,
(b) combinations of material from the cultures in which the test is to be
given, and (c) use of content from each country to make a test geared to
that particular country.

Method

Instruments 

The instruments central to the study were two parallel forms of a
reading comprehension test appropriate for high school graduates and
college entrants in the United States, their versions translated into
Turkish, and their versions re-translated back into English. Each test
consisted of 30 five-choice items based on five expository reading
passages. The tests were developed at Teachers College, Columbia University,
on the basis of a pilot tryout in three colleges in the United States.
The Kuder-Richardson Formula 20 reliability of the two forms averaged
.78 for the English versions, .73 for the translated Turkish versions for
a sample of Turkish students with similar heterogeneity in ability, and
.74 for the re-translated English versions. The original English versions
A and B correlated .55 and .69 with SAT Verbal scores.
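
For readers unfamiliar with the Kuder-Richardson Formula 20 coefficient named above, the following is a minimal sketch of how it can be computed from a matrix of right/wrong item scores. The function name `kr20` and the toy data are illustrative assumptions, not material from the study, and the sketch assumes the population variance of total scores is used.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])

    # Proportion of examinees passing each item (p); q is 1 - p.
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)

    # Population variance of the total scores (an assumption of this sketch).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical toy data: 4 examinees by 3 items (not from the study).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(toy), 2))  # 0.75
```

With a 30-item test the same function would take one row of 30 scores per examinee.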

Subjects and Administration Procedures

The tests were administered to five groups of high school and college
students in the United States and Turkey:

1. Seven hundred fourteen seniors from nine Turkish secondary schools
in Ankara were tested with two forms of the reading test, each form in
Turkish. Testing order was counterbalanced, half the students taking each
test first. The sample of high schools was chosen through the assistance
of the Turkish Ministry of Education, and by referring to a report
indicating the rank of students from the respective schools on the
University of Ankara Entrance Examinations.

2. Ninety-six first year Turkish students at the Middle East Technical
University in Ankara comprised the Turkish college sample, taking both forms
of the reading tests in Turkish. Testing order was counterbalanced for
this group also.

3. A total of 567 American high school seniors, about 80 per cent
of whom were in the academic curriculum, were tested in three high schools
from eastern United States. The original and re-translated English versions
were administered randomly among students in two of the schools; in the
third, testing was conducted only with the original English versions.

4. The American college sample consisted of 816 students from four
colleges in eastern and southern United States. In three of the colleges
the original English versions were administered randomly among students.
The fourth college participated in the administration of both the original
and the re-translated versions.

5. Eighty-six Turkish secondary school graduates and graduate students
receiving special instruction in English in preparation for studying in
American universities were administered one form of the reading test in
Turkish and the parallel form in English.

Results

Total Scores

Table 1 shows a comparison of total scores for the various groups
tested. As may be observed in Part A of the table, the two forms of the
tests in English and Turkish were approximately of the same difficulty for
American and Turkish samples in the same grade. Furthermore, the two
alternate forms retained their comparable difficulty with translation and
administration in a different culture. An analysis of variance showed
these differences to be non-significant (F = 2.47, df 3/925). In
Part B of the table a comparison is shown between total scores on the
original and re-translated English versions administered to American students.
No striking shifts in the overall difficulty of the tests appeared as they
were translated into Turkish and re-translated into English (F = 1.78,
df 3/403, p > .05). As might be expected, for Turkish students studying
English, English scores were much lower than scores in Turkish. Comparisons
between English and Turkish scores are shown in Part C of Table 1. For this
group also the alternate forms of the reading tests retained their comparable
difficulty.
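
As an illustration of the analysis of variance reported for Part A, the sketch below computes a one-way F ratio from group sizes, means, and standard deviations alone. The input values are the college rows of Table 1, Part A. The text does not describe how the analysis was set up, so this sketch assumes four independent groups and standard deviations computed with n - 1; the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F ratio from group sizes, means, and standard deviations."""
    total_n = sum(ns)
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n

    # Between-groups sum of squares from the group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares; assumes each S.D. used n - 1.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_between, df_within = k - 1, total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# College rows of Table 1, Part A, in the order:
# U.S. Form A, Turkey Form A, U.S. Form B, Turkey Form B.
ns = [381, 96, 356, 96]
means = [19.66, 18.70, 19.20, 18.35]
sds = [4.68, 4.36, 5.14, 4.14]

f_ratio, df1, df2 = anova_from_summary(ns, means, sds)
print(df1, df2, round(f_ratio, 2))
```

Under these assumptions the degrees of freedom come out as 3 and 925, matching the text, and the F ratio lands near 2.5, close to the reported value. The small gap may reflect rounding in the tabled statistics or a different computational setup.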



Insert Table 1 about here 



Item Difficulty and Popularity of Errors 

Analyses relating to the stability of responses to specific items 
and item options within- and between-countries are shown in Table 2. 



Insert Table 2 about here

To obtain the within-country reliability, the data for the Turkish
high school and American college groups were allocated into two subsamples
equated on the basis of total score. Then, item difficulty indices
(percentage of students answering each item correctly) and popularity of
each error (obtained by adjusting percentage remaining after choice of
the right option to 100 and by considering the popularity of the four
remaining options as percentage of this total) were correlated for
subsamples of each country.
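
The two item statistics defined above can be sketched as follows. This is an illustrative reading of the definitions, not the study's own procedure: difficulty is the percentage choosing the keyed option, and the popularity of each error is its share of all wrong answers, rescaled so the wrong options sum to 100. The function name `item_statistics` and the responses are hypothetical, and omitted responses are not handled.

```python
from collections import Counter


def item_statistics(choices, key):
    """Difficulty index and error popularity for one multiple-choice item.

    choices: list of options chosen by examinees (e.g. "A" to "E").
    key: the correct option.
    Returns (percent correct, {wrong option: percent of all wrong answers}).
    """
    counts = Counter(choices)
    n_right = counts[key]
    n_wrong = len(choices) - n_right

    difficulty = 100.0 * n_right / len(choices)
    # Rescale the wrong answers so that they total 100 per cent.
    popularity = {
        option: 100.0 * count / n_wrong
        for option, count in counts.items()
        if option != key
    }
    return difficulty, popularity


# Hypothetical responses of 10 examinees to one five-choice item keyed "B".
choices = ["B", "B", "B", "B", "B", "B", "A", "A", "C", "E"]
difficulty, popularity = item_statistics(choices, key="B")
print(difficulty)   # 60.0
print(popularity)   # {'A': 50.0, 'C': 25.0, 'E': 25.0}
```

Options that nobody chose (here "D") simply do not appear in the result and can be treated as zero.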

It may be observed in Part A of Table 2 that both item difficulty
and frequency of responses to item errors were highly stable within
the United States and Turkey. The stability of responses to errors was
somewhat lower than that of difficulty, which was almost unity in both
countries.

Since the different groups in Part B of Table 2 varied in size, in
determining the correlations of item difficulty and popularity of errors
between the various groups, a correction for sample size was introduced
using the Spearman-Brown formula and a correction for attenuation.
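
The two corrections named above have standard textbook forms, sketched below. The text does not report the lengthening factor or the reliabilities actually entered into each correction, so the numbers here are hypothetical and the function names are illustrative only.

```python
import math


def spearman_brown(r, k):
    """Projected reliability when a measure is lengthened by a factor k."""
    return k * r / (1 + (k - 1) * r)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Correlation between x and y adjusted for the unreliability of both."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values, not taken from the study.
print(round(spearman_brown(0.50, 2), 2))                    # 0.67
print(round(correct_for_attenuation(0.60, 0.90, 0.80), 2))  # 0.71
```

Both corrections raise an observed correlation, which is worth keeping in mind when the between-group coefficients are compared with the uncorrected within-country reliabilities.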

It is interesting to observe that all correlations dealing with item
difficulty, i.e., responses to the right options, were reasonably high
although they were not as high as the within-country reliability of this
index. As would be expected, the average correlation of .69, indicating
the case where both translation effects and cultural differences intervene,
was the lowest. Highest was the correlation between the original and
re-translated versions administered within the American culture.

On the other hand, results based on comparison of popularity of errors
diverged considerably from the profile of correct responses. The correlation
between errors in the original and re-translated tests administered to
Americans was considerably higher than those for tests administered to
different cultural groups. The language in which the test was administered
was evidently not influential, as may be observed in the correlations of
.40 and .37 which are quite similar. Of course, it must be remembered that
the Turkish students tested in English were working under language handicap
and the results probably reflect lack of proficiency in the language as
well as cultural difference.

Item Discrimination

Table 3 shows the within- and between-country stability of item
discrimination indices, i.e., point biserial correlation coefficients
of each item with total score corrected for the inclusion of that item
in the total score. The within-country reliability of the index has been
obtained in the same manner as for item difficulty and errors shown in
Table 2.
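
One common way to obtain a point biserial coefficient that is free of the item's own contribution is to correlate the item with the total of the remaining items. The sketch below takes that route; whether the study removed the item or applied an algebraic correction is not stated, so this is an assumption. The function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_point_biserial(responses, item):
    """Correlate a 0/1 item with the total score of the remaining items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Hypothetical toy data: 4 examinees by 3 items (not from the study).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(corrected_point_biserial(toy, 0), 2))  # 0.52
```

Because the item score is dichotomous, the Pearson coefficient computed here is numerically the point biserial.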



Insert Table 3 about here



The contrast between results based on difficulty and discrimination
indices may be observed by a comparison of Tables 2 and 3. First, the
within-country reliabilities of discrimination indices were considerably
lower than those for difficulty and errors. The former were in the .60s,
the latter in the .90s. The within-country reliability of discrimination
was, however, higher in Turkey than in the United States.

More interesting is the almost negligible relationship between countries
in item discrimination power as reflected in the average correlation of .15.
This contrast seems especially intriguing in view of the stability of item
difficulty and discrimination characteristics within each of the two countries
and the relative cross-cultural homogeneity of item difficulty.

Table 4 shows an analysis of the relationship between indices of difficulty
and discrimination within the United States as compared to Turkey. While
there seems to be no correspondence between difficulty and discrimination
within the United States, as shown by the average correlation of .03 for
the two forms of the tests, in Turkey easier items had higher discrimination,
an average correlation of .47.

Insert Table 4 about here



Reading Passages 

A general comparison of the extent to which groups of items based on
a specific reading passage retained their relative difficulty when administered
to Turkish students in the vernacular is shown in Table 5. The rank correlation
of .92 of reading passage difficulties implies that the difficulty of items
was not determined by the nature of the reading passage on which they were
based. This correlation may also imply that the content of the reading
passages retained their difficulty in the Turkish culture. It should be
admitted, however, that the rank correlation was based on only a small number
of cases.
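
The rank correlation can be checked against the ranks printed in Table 5. The sketch below computes a Pearson correlation on the ranks, which handles the tied ranks directly. One U.S. rank (Form A, passage 4) is not legible in the table; the sketch assumes it is 9, the only rank from 1 to 10 not otherwise accounted for. The variable names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Ranks from Table 5: Form A passages 1-5, then Form B passages 1-5.
# The fourth U.S. rank (9) is an assumption; it is illegible in the table.
us_ranks = [1, 2, 3.5, 9, 10, 6, 8, 5, 3.5, 7]
turkey_ranks = [1, 2, 5, 9.5, 9.5, 7, 6, 3, 4, 8]

rho = pearson(us_ranks, turkey_ranks)
print(round(rho, 2))  # 0.92
```

With only ten passages, a coefficient of this size still carries a wide margin of uncertainty, which is the caution the text itself raises.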



Insert Table 5 about here

Discussion

Translations of reading tests seem to be relatively culture-fair
measures if total test scores, relative difficulty of reading passages,
and indices of item difficulty are considered as criteria for test
fairness. Remarkable similarity was shown in the total scores of American
and Turkish students at similar educational levels when they took the tests
in their own language. Reading passages also retained their relative
difficulty. Although the .69 correlation between item difficulties was
somewhat lower than the correlations ranging from .80 to .98 obtained in
the study conducted under the auspices of the UNESCO Institute for Education
(Foshay et al., 1962), it may be remembered that the tests used in the
UNESCO study were developed jointly by representatives from various countries,
selecting items and passages from different national sources. From a theoretical
point of view, the present study may imply that careful translations of college
level reading tests measure what Holmes (1954) called the ability to infer
from a text, a process which is relatively uniform throughout cultures.

The findings in the present study may perhaps have implications for
the growing necessity for, and interest in, a better assessment of foreign
student aptitude. Since American screening devices such as the CEEB and the
GRE are not in the vernacular of foreign students, scores do not reveal
the differential effects of language proficiency and academic aptitude. The
SAT Verbal score in English has very little predictive validity for foreign
student academic achievement, as stated in a report of workshops sponsored
by the CEEB and IIE (1962). English proficiency measures such as the
English Composition Test and the Test of English as a Foreign Language used
as supplementary devices are not good predictors of college level achievement
for foreign students (Allen, 1965; Kaplan and Jones, 1965). Thus testing
with a pair of equivalent forms of reading tests, one in English and a
translated version in the native language of the examinee, might provide
a powerful diagnostic tool for identifying (a) the individual's potential
for higher education and (b) the extent to which this potential is depressed
as he is faced with the necessity of shifting from his native language to
English as a language of instruction.

The countries represented by the foreign students in the United States
are so numerous that constructing pairs of equivalent tests for each country
and the United States on the basis of pilot tryouts illustrated by Manuel
(1961, 1962) may result in an unduly elaborate enterprise. Since shifts in
item difficulty, reading passage difficulty, and total scores do not seem
significant, the possibility of translating an American reading test into
the vernacular of each group of foreign students might be explored. These
translated tests might perhaps be accepted as measures parallel to their
English versions for foreign students.

In terms of psychometric data concerning the interrelationship between
different item statistics, three general conclusions may be reached:

First, item difficulty, as reflected in responses to the right option,
is much more stable across cultures than are responses to the wrong options
or errors. This could perhaps mean that in cases where the text is not
read carefully, students resort to answering the item on the basis of general
knowledge and, in turn, conclude with the wrong answer. Curricular emphases
may produce differential familiarity in specific areas of content, resulting
in difference in general knowledge. A supplementary study was conducted
after the termination of the testing as a follow-up of this idea. American
graduate students were asked to answer the questions in the English reading
tests without reading the passages on which they were based. Item difficulties
and responses to the wrong options were correlated for the tests administered
with the reading passage and without reference to the passage. Responses to
the right options correlated low in the .20s; responses to errors correlated
about .50. Considering the fact that wrong option choices are somewhat
influenced by response to the right option, the latter correlation seems quite
high and implies that general knowledge partly determines wrong option choices.

Second, difficulty and discrimination data used in cross-cultural comparisons
reveal opposing results, the former being quite stable, the latter showing
large shifts.

Third, the relationship between item easiness and better discrimination
in Turkey but not in the United States may be related to the fact that the
test was originally developed in English and for an American culture. When
a test of this kind is administered in a different culture, items which are
culturally loaded probably appear to be more difficult. In turn, other
psychometric functions such as indices of discrimination may be affected,
producing a positive relationship between difficulty and discrimination.

A somewhat surprising aspect of the study is that a verbal test, especially
a reading test, yielded promising data in terms of culture-fairness. Theoreticians
such as Whorf (1958) hypothesize that cognition and thought are molded by the
linguistic specifications of a particular culture, thereby implying that the
difficulty of any concept cannot be predicted directly from another language.
The existing international aptitude scales such as the Leiter International
Performance Scale and the Progressive Matrices Test are non-verbal. However,
both in the United States and abroad, non-verbal tests have been found to
correlate lower with academic achievement than verbal tests (MacArthur
and Elley, 1963; Bolton, 1947; Keehn and Prothro, 1955). It is hoped that
the present research may prompt future investigation in the use of reading
tests as international aptitude devices.

Summary 

Comparisons of different psychometric criteria were made on two
parallel forms of American college level reading tests, their versions
translated into Turkish, and their versions re-translated from Turkish into
English. In terms of total test scores, difficulty of reading passages,
and indices of item difficulty, the tests yielded relatively consistent
results with translation and administration in a different culture. Responses
to the wrong options of an item tended to be based on general knowledge more
than responses to right options. The sharpness of discrimination of items
showed a negligible relationship between the two cultures, although within
each culture it showed considerable stability. Item difficulty and
discrimination correlated low within the United States, but significantly
within Turkey.

Table 1

Comparisons Based on Total Scores

Part A: Tests Administered in the Vernacular

                       Form A               Form B
                    U.S.   Turkey        U.S.   Turkey

College   Mean     19.66    18.70       19.20    18.35
          S.D.      4.68     4.36        5.14     4.14
          N          381       96         356       96

H.S.      Mean     14.64    15.71       14.58    15.04
          S.D.      6.47     3.61        6.39     3.64
          N          194      714         204      714

Part B: Original and Re-Translated Versions Administered
        within the United States

                       Form A                     Form B
                Original  Re-Translated    Original       Re-Translated

          Mean    16.78       15.65          17.02            16.71
          S.D.     5.15        4.52          5.4[illegible]    4.90
          N         112         106            106               83

Part C: English-Turkish Versions Administered to Turkish Students

                       Form A               Form B
                  Turkish  English     Turkish  English

          Mean     18.24     4.40       17.42     4.91
          S.D.      3.82     4.51        3.69     4.43
          N           41       45          45       41

Table 2

Correlation of Item Difficulty and Popularity of Errors
within- and between-Countries
(N = 30 items)

                                Form A            Form B         Average for Forms
                            Diffic.  Errors   Diffic.  Errors     Diffic.  Errors

A: Within-Country
   Reliability (a)
     U.S.                     .98      .87      .97      .92        .98      .90
     Turkey                   .98      .94      .98      .95        .98      .94

B: Correlations
     U.S.-Turkey (b)          .67      .49      .71      .32        .69      .40
     (H.S. and college
     students tested in
     the vernacular)

     U.S.-Turkey (c)          .83      .30      .77  [illegible]    .80      .37
     (American and Turkish
     students tested in
     English)

     Orig.-Re-Translated (c)  .95      .70      .77      .74        .86      .72
     (American students)

(a) Correlation within subsamples.
(b) Average of correlation across subsamples of the two countries,
    corrected for attenuation using the within-country reliability.
(c) Corrected by Spearman-Brown formula and for attenuation.

Table 3

Correlation of Item Discrimination Indices
within- and between-Countries
(N = 30 items)

                            Form A      Form B      Average for Forms

A: Within-Country
   Reliability (a)
     U.S.                     .56         .47              .52
     Turkey                   .80     [illegible]          .66

B: Correlations
     U.S.-Turkey (b)          .10         .20              .15
     (H.S. and college
     students tested in
     the vernacular)

(a) Correlation within subsamples.
(b) Average of correlation across subsamples of the two countries,
    corrected for attenuation using the within-country reliability.

Table 4



Within-Country Correlations between Difficulty and Discrimination Indices
(N = 30 items)

               Form A      Form B      Average for Forms

U.S.            .19     [illegible]          .03
Turkey          .64         .30              .47

Table 5

Difficulty of Reading Passages
for American College and Turkish High School Samples

Reading               U.S.                      Turkey
Passage       Difficulty     Rank        Difficulty     Rank

Form A
   1              80          1              72          1
   2              73          2              62          2
   3              72          3.5            50          5
   4         [illegible]  [illegible]        38          9.5
   5              48         10              38          9.5

Form B
   1              64          6              46          7
   2              58          8              48          6
   3              66          5              58          3
   4              72          3.5            57          4
   5              60          7              41          8

Rank correlation: .92